Unsupervised Learning of Paraphrases

Authors

  • João Cordeiro
  • Gaël Dias
  • Pavel Brazdil
Abstract

Paraphrasing constitutes a cornerstone of many Natural Language Processing fields, such as monolingual text-to-text generation and automatic text summarization. Indeed, aligned monolingual corpora are likely to boost the learning process of text-to-text generation models. A paraphrase learning strategy can be defined as a two-step process: (1) identifying and extracting related sentence pairs from online comparable corpora (for example, sentences that convey the same information yet are written in different forms) and (2) applying learning methodologies to the extracted material to induce text-to-text rewriting rules. In this paper, we compare different lexical distance metrics for the identification of related sentences, i.e., paraphrase candidates. In particular, we discuss how different metrics lead to the identification of different types of paraphrases. Finally, the comparisons and discussions give relevant insights into the automatic generation of paraphrase corpora.
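
To make step (1) concrete, the following is a minimal sketch of how lexical metrics might be used to select paraphrase candidates. The abstract does not name the metrics the paper compares, so the two shown here (Jaccard word overlap and word-level edit distance) are illustrative stand-ins, not the authors' method. The function names, the tokenizer, and the `low`/`high` thresholds are all assumptions made for this example.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def jaccard_overlap(a, b):
    """Word-set overlap |A & B| / |A | B|, a similarity in [0, 1]."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def word_edit_distance(a, b):
    """Levenshtein distance computed over word tokens instead of characters."""
    x, y = tokenize(a), tokenize(b)
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        curr = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def candidate_pairs(sentences, low=0.4, high=0.9):
    """Return (score, s1, s2) tuples whose overlap lies in [low, high].

    The upper bound is an assumption of this sketch: it drops pairs that
    are nearly identical, since those offer little to learn rewriting
    rules from.
    """
    pairs = []
    for s1, s2 in combinations(sentences, 2):
        score = jaccard_overlap(s1, s2)
        if low <= score <= high:
            pairs.append((score, s1, s2))
    return sorted(pairs, reverse=True)
```

Swapping `jaccard_overlap` for a score derived from `word_edit_distance` changes which pairs are selected: set overlap ignores word order, while edit distance penalizes reordering, which is one simple way to see how different metrics can favor different types of paraphrases.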

Similar resources

Extracting Paraphrases from a Parallel Corpus

While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. Our approach yields phrasal and single word lexical paraphrases as well as sy...

Learning Paraphrases to Improve a Question-Answering System

In this paper, we present a nearly unsupervised learning methodology for automatically extracting paraphrases from the Web. Starting with one single linguistic expression of a semantic relationship, our learning algorithm repeatedly samples the Web, in order to build a corpus of potential new examples of the same relationship. Sampling steps alternate with validation steps, during which implaus...

Unsupervised Metaphor Paraphrasing using a Vector Space Model

We present the first fully unsupervised approach to metaphor interpretation, and a system that produces literal paraphrases for metaphorical expressions. Such a form of interpretation is directly transferable to other NLP applications that can benefit from a metaphor processing component. Our method is different from previous work in that it does not rely on any manually annotated data or lexic...

Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web

This paper presents a lightweight method for unsupervised extraction of paraphrases from arbitrary textual Web documents. The method differs from previous approaches to paraphrase acquisition in that 1) it removes the assumptions on the quality of the input data, by using inherently noisy, unreliable Web documents rather than clean, trustworthy, properly formatted documents; and 2) it does not ...

Learning Paraphrases from WNS Corpora

Paraphrase detection can be seen as the task of aligning sentences that convey the same information yet are written in different forms. Such resources are important to automatically learn text-to-text rewriting rules. In this paper, we present a new metric for unsupervised detection of paraphrases and apply it in the context of clustering of paraphrases. An exhaustive evaluation is conducte...

Paraphrase Alignment for Synonym Evidence Discovery

We describe a new unsupervised approach for synonymy discovery by aligning paraphrases in monolingual domain corpora. For that purpose, we identify phrasal terms that convey most of the concepts within domains and adapt a methodology for the automatic extraction and alignment of paraphrases to identify paraphrase casts from which valid synonyms are discovered. Results performed on two different...

Publication date: 2007